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Abstract 

The  researchers  made  significant  progress  in  all  of  the  proposed  research  areas.  The  first  major  task  in 
the  proposal  involved  model-based  randomized  methods  for  global  optimization.  In  support  of  this  task,  the 
researchers  developed  new  methods  for  stochastic  derivative  estimators  for  discontinuous  payoff  functions; 
the  method  includes  Infinitesimal  Perturbation  Analysis  and  the  Likelihood  Ratio  method  as  special  cases 
and  can  be  applied  to  functions  of  more  general  forms  containing  indicator  functions.  The  researchers  de¬ 
veloped  a  new  method  of  distributed  ordinal  comparison  of  selecting  the  best  option,  which  maximizes  the 
average  of  local  reward  function  values  among  available  options  in  a  dynamic  network,  they  discovered  a 
new  innovative  approach  to  simulation-based  global  optimization  by  building  a  connection  between  global 
optimization  and  evolutionary  games,  as  well  as  another  new  approach  that  exploits  particle  filtering;  they 
have  summarized  our  model-based  results  in  a  comprehensive  survey  paper.  The  researchers  also  made 
significant  progress  in  other  model-based  randomized  methods,  including  a  stochastic  search  algorithm  for 
solving  general  optimization  problems  with  little  structure;  the  algorithm  iteratively  finds  high  quality  solu¬ 
tions  by  randomly  sampling  candidate  solutions  from  a  parameterized  distribution  model  over  the  solution 
space. 

In  support  of  the  second  task,  the  researchers  made  progress  incorporating  simulation-based  and  sam¬ 
pling  methods  into  Markov  Decision  Processes  (MDPs).  They  made  significant  progress  in  new  simulation- 
based  approaches  to  MDPs,  and  in  applications  to  problems  of  supply  chains  and  finance.  In  particular,  the 
researchers  have  developed  a  simulation-based  algorithm  called  Approximate  Stochastic  Annealing  (ASA) 
for  solving  finite  horizon  MDPs. 

In  addition,  the  researchers  developed  a  theory  and  new  methods  for  solving  dynamic  stochastic  opti¬ 
mization  problems  with  non-convex  risk-sensitive  performance  measures,  as  well  as  two  new  methods  for 
simulation  optimization. 
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1  Introduction 

In  this  research  project,  we  proposed  to  investigate  basic  questions  aimed  at  challenges  in  information  su¬ 
periority,  logistics,  and  planning  for  the  Air  Force  of  the  future.  In  particular,  we  proposed  to  investigate 
simulation-based  methodologies  for  global  optimization  and  planning  that  can  be  effective  tools  in  an  in¬ 
tegrated  approach  to  Command  and  Control  (C2),  planning,  and  logistics.  The  questions  we  investigated 
were  motivated  by  future  Air  Force  requirements,  which  will  involve  a  flexible  and  world-responsive  set 
of  missions.  More  agile,  responsive,  and  integrated  systems  will  be  required.  The  aim  of  the  research  was 
to  facilitate  Air  Force  decision  making  at  many  levels  of  operations,  particularly  at  the  mission,  campaign, 
and  strategic  levels.  The  proposed  simulation-based  methodologies  were  intended  to  provide  approaches  to 
optimization  and  comparison  of  Alternative  Courses  of  Action  in  Air  Force  simulation-based  models. 

Such  problems  and  systems  arc  exceedingly  complex;  in  order  to  solve  them,  we  focused  on  using 
simulation-based  methods  for  global  optimization  and  sequential  decision  making  under  uncertainty. 
In  particular,  we  combined  three  approaches  in  the  study  of  such  problems: 

•  Developing,  studying,  and  analyzing  efficient  simulation-based  and  sampling  methodologies  for  global 
optimization  problems; 

•  Developing  and  studying  efficient  parallel  simulation-based  and  sampling  methodologies  for  problems 
of  dynamic  decision  making  under  uncertainty; 

•  Studying  the  application  of  these  optimization  methodologies  to  practical  problems,  such  as  preven¬ 
tive  maintenance,  path  planning  for  unmanned  aerial  vehicles,  and  financial  engineering. 

1.1  Model-Based  Global  Optimization 

For  a  a  bounded  deterministic  measurable  function  H  :  3£  — »•  S%,  where  222  is  the  feasible  solution  space, 
the  optimization  problem  is  to  find 


x*  €  argma xH(x).  (1) 

X€& 

It  is  common  in  many  situations  to  introduce  a  measurable  strictly  increasing  fitness  function,  0  :  &  ->  ^+, 
and  reformulate  (1)  as 


x*  €  arg  max  0  (H(x)  ) ,  (2) 

which  guarantees  the  range  of  the  new  fitness-objective  function  will  always  be  non-negative. 

Model-based  global  optimization  methods  use  probability  distributions  to  weight  promising  areas  of 
the  solution  space,  where  the  distribution  is  updated  iteratively  based  on  output  from  the  samples  drawn 
according  to  the  current  distribution.  They  arc  well  suited  for  global  optimization  problems  where  there  is 
limited  structural  information  on  the  optimizing  function  (e.g.,  derivatives,  convexity). 

1.2  Simulation-Based  and  Sampling  Methods  for  Markov  Decision  Processes 

Simulation  optimization  problems  arising  in  supply  chain  management,  path  planning  for  unmanned  aerial 
vehicles,  financial  engineering,  and  telecommunications  arc  characterized  by  two  critical  aspects:  changing 
dynamics  and  stochastic  events.  For  example,  effective  supply  chain  management  requires  optimal  respon¬ 
sive  actions  in  the  face  of  both  gradual  shifts  in  demand  patterns  (e.g.,  due  to  technology  advances)  and 
sudden  unpredictable  disruptions  in  production  capacity  (e.g.,  due  to  an  unanticipated  manufacturing  facil¬ 
ity  shutdown).  Such  systems  often  require  computationally  expensive  simulation  models  for  performance 
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estimation,  such  as  modeling  the  operations  of  an  entire  semiconductor  fabrication  facility,  where  simulation 
runtime  is  typically  on  the  order  of  hours.  Markov  decision  processes  (MDPs)  provide  a  powerful  paradigm 
for  modeling  optimal  decision  making  under  uncertainty  in  these  settings,  but  MDPs  suffer  from  the  well- 
known  curse  of  dimensionality,  which  can  include  exponential  growth  in  the  size  of  state  spaces  and  action 
spaces  with  the  problem  size;  thus,  direct  numerical  solution  of  MDPs  for  large-scale  real-world  problems 
presents  a  formidable  computational  challenge.  In  general,  heuristics  and  approximations  arc  employed 
to  simplify  the  MDP  model.  Perhaps  the  most  successful  example  of  this  approach  has  been  approximate 
dynamic  programming  using  value  function  approximation  (cf.  [4,  16,  29,  37,  40]). 

The  Pis  have  developed  evolutionary  and  simulation-based  algorithms  that  provide  new  advances  in  the 
solution  of  MDPs  [9].  These  advances  include  the  following  approaches:  adaptive  multi-stage  sampling  — 
well-suited  for  large  state  spaces  but  relatively  smaller  action  spaces;  and  population-based  evolutionary 
randomized  policy  search  —  designed  to  handle  large  action  spaces.  The  Pis  have  developed  many  such 
algorithms  [7,  8,  9,  10,  12,  23]. 

2  Research  Results 

2.1  Model-Based  Global  Optimization 

We  distinguish  between  instance-based  and  model-based  global  optimization  solution  methods.  In  instance- 
based  methods,  the  search  for  new  candidate  solutions  depends  directly  on  previously  generated  solutions, 
e.g.,  simulated  annealing,  genetic  algorithms  (GAs),  tabu  search,  and  nested  partitions.  On  the  other  hand,  in 
model-based  algorithms,  new  candidate  solutions  arc  generated  via  an  intermediate  probability  model  that  is 
iteratively  updated.  Our  research  has  focused  on  the  model-based  optimization  framework,  which  involves 
the  following  ingredients: 

(0)  specify  probability  distribution  over  solution  space; 

(I)  generate  candidate  solutions  by  sampling  from  distribution; 

(II)  estimate  performance  of  (and  possibly  improve)  candidate  solutions; 

(III)  update  distribution  based  on  selected  (“elite”)  set  of  candidate  solutions. 

This  approach  retains  the  primary  strengths  of  population-based  approaches  such  as  genetic  algorithms 
—  improving  upon  simulated  annealing,  which  works  with  a  single  iterate  at  a  time,  while  at  the  same  time 
providing  more  flexibility  in  exploring  the  entire  solution  space,  introducing  more  structure  in  the  search 
procedure,  and  allowing  theoretical  properties  to  be  studied  regarding  both  finite-time  performance  and 
asymptotic  convergence.  The  theory  behind  the  framework  is  rigorous,  but  based  on  an  idealized  version 
of  the  last  three  ingredients,  specifically  the  distribution  sequence,  sampling  from  the  distribution  sequence 
(or  from  a  surrogate  or  approximation),  and  estimation  of  the  performance,  since  it  is  observed  through 
simulation.  Schematically,  we  seek  a  sequence  of  distributions 

<1>0>  <1>  1 1  §2  j  ■  ■  •  t 

where  concentrates  its  mass  around  the  optimal  solutions. 

Examples  for  the  sequence  of  distributions  {gjJ  include  the  instantiation  of  our  model  reference  adaptive 
search  (MRAS)  method  in  [22] . 

However,  the  sequence  {g^}  is  unknown  explicitly  a  priori,  or  else  the  problem  would  essentially  be 
solved.  Our  MRAS  approach  uses  a  projection  onto  distributions  that  are  easy  to  work  with,  e.g.,  uses 
a  family  of  parameterized  distributions  {fe},  and  projects  gk  onto  the  family  to  obtain  a  sequence  that 
converges  to  the  (final)  target  distribution,  i.e., 

fdoifOi  if 9 2  ’  **  *  ^ 

a  common  implementation  minimizes  the  Kullback-Leibler  (KL)  divergence  between  fgk  and  g/(  at  each 
iteration,  because  it  leads  to  analytically  tractable  solutions  if  the  parameterized  distributions  are  from  the 
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exponential  family.  This  leads  to  a  population  of  candidate  solutions,  from  which  an  elite  set  is  selected  and 
used  to  update  the  distribution. 

In  [42],  we  propose  a  new  framework  for  continuous  global  optimization  problems  by  building  a  con¬ 
nection  between  global  optimization  problems  and  evolutionary  games,  and  we  show  that  a  particular  equi¬ 
librium  set  of  the  evolutionary  game  is  asymptotically  stable.  Based  on  this  connection,  we  propose  a 
Model-based  Evolutionary  Optimization  (MEO)  algorithm,  which  uses  probabilistic  models  to  generate 
new  candidate  solutions  and  uses  dynamics  from  evolutionary  game  theory  to  govern  the  evolution  of  the 
probabilistic  models.  The  MEO  algorithm  also  gives  new  insight  into  the  mechanism  of  model  updating  in 
model-based  global  optimization  algorithms.  Based  on  the  MEO  algorithm,  a  Population  Model-based  Evo¬ 
lutionary  Optimization  (PMEO)  algorithm  is  proposed,  which  captures  the  multimodal  property  of  global 
optimization  problems.  Simulation  experiments  demonstrate  the  effectiveness  of  the  proposed  algorithm. 

We  study  in  [26]  a  class  of  random  sampling-based  algorithms  for  solving  general  non-differentiable 
optimization  problems.  These  arc  iterative  approaches  that  arc  based  on  sampling  from  and  updating  an 
underlying  distribution  function  over  the  set  of  feasible  solutions.  In  particular,  we  propose  a  novel  and 
systematic  framework  to  investigate  the  convergence  and  asymptotic  convergence  rates  of  these  algorithms 
by  exploiting  their  connections  to  the  well-known  stochastic  approximation  (SA)  method.  Such  an  SA 
framework  unities  our  understanding  of  these  randomized  algorithms  and  provides  new  insight  into  their 
design  and  implementation  issues.  Our  preliminary  numerical  experiments  indicate  that  new  implementa¬ 
tions  of  these  algorithms  based  on  the  proposed  framework  may  lead  to  improved  performance  over  existing 
procedures. 

The  Annealing  Adaptive  Search  (AAS)  algorithm  for  global  optimization  searches  the  solution  space 
by  sampling  from  a  sequence  of  Boltzmann  distributions.  For  a  class  of  optimization  problems,  it  has  been 
shown  that  the  complexity  of  AAS  increases  at  most  linearly  in  the  problem  dimension.  However,  despite 
its  desirable  property,  sampling  from  a  Boltzmann  distribution  at  each  iteration  of  the  algorithm  remains 
a  practical  challenge.  Prior  work  to  address  this  issue  has  focused  on  embedding  Markov  chain-based 
sampling  techniques  within  the  AAS  framework.  In  [24,  25],  based  on  ideas  from  the  recent  Cross-Entropy 
method  and  our  Model  Reference  Adaptive  Search  method,  we  propose  an  algorithm,  called  Model-based 
Annealing  Random  Search  (MARS),  that  complements  prior  work  by  sampling  solutions  from  a  sequence 
of  surrogate  distributions  that  iteratively  approximate  the  target  Boltzmann  distributions.  We  establish  a 
novel  connection  between  MARS  and  the  well-known  Stochastic  Approximation  method.  By  exploiting 
this  connection,  we  prove  the  global  convergence  of  MARS  and  characterize  its  asymptotic  convergence 
rate  behavior.  Our  empirical  results  indicate  promising  performance  of  the  algorithm  in  comparison  with 
some  of  the  existing  methods. 

The  paper  [45]  presents  a  novel  interpretation  to  transform  an  optimization  problem  into  a  filtering  prob¬ 
lem,  where  the  goal  is  to  compute  the  conditional  distribution  of  the  unobserved  state  given  the  observation 
history.  We  prove  that  in  our  formulation  the  conditional  distribution  converges  asymptotically  to  a  degen¬ 
erate  distribution  concentrated  on  the  global  optimum.  Hence,  the  goal  of  searching  for  the  global  optimum 
can  be  achieved  by  computing  the  conditional  distribution  sequentially.  That  is  done  through  the  application 
of  particle  filtering,  a  class  of  sequential  Monte  Carlo  methods  for  filtering,  which  has  proven  convergence 
in  tracking  the  conditional  distribution.  The  resultant  algorithmic  framework  unifies  some  recent  random¬ 
ized  optimization  algorithms  as  well  as  providing  new  insights  into  their  connection.  More  importantly,  the 
framework  opens  up  the  possibility  of  new  improved  algorithms.  In  particular,  we  develop  a  new  improved 
cross-entropy  method  under  this  framework,  and  the  numerical  results  show  that  our  method  is  very  effective 
in  preventing  premature  convergence  of  the  Cross-Entropy  method. 

The  paper  [21]  aims  to  improve  the  sampling  efficiency  of  model-based  methods  for  global  optimization 
by  considering  a  generalization  where  a  population  of  distribution  models  is  maintained  and  subsequently 
propagated  from  generation  to  generation.  A  key  issue  in  the  proposed  approach  is  how  to  efficiently  allocate 
the  sampling  budget  among  the  population  of  models  to  maximize  the  algorithm  performance.  We  formulate 
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this  problem  as  a  generalized  max  k-armed  bandit  problem,  and  derive  an  efficient  dynamic  sample  alloca¬ 
tion  scheme  based  on  Markov  decision  theory  to  adaptively  allocate  computational  resources.  The  proposed 
allocation  scheme  is  then  further  used  to  update  the  current  population  to  produce  an  improving  population 
of  models.  Our  preliminary  numerical  results  indicate  that  the  proposed  procedure  may  considerably  reduce 
the  number  of  function  evaluations  needed  to  obtain  high  quality  solutions,  and  thus  further  enhance  the 
value  of  model-based  methods  for  optimization  problems  that  require  expensive  function  evaluations  for 
performance  evaluation. 

We  propose  in  [15]  to  improve  the  efficiency  of  simulation  optimization  by  integrating  the  notion  of 
optimal  computing  budget  allocation  into  the  Cross-Entropy  (CE)  method.  This  paper  focuses  on  continuous 
optimization  problems.  In  the  stochastic  simulation  setting  where  replications  are  expensive  but  noise  in 
the  objective  function  estimate  could  mislead  the  search  process,  the  allocation  of  simulation  replications 
can  make  a  significant  difference  in  the  performance  of  such  global  optimization  search  algorithms.  A 
new  allocation  scheme  is  developed  based  on  the  notion  of  optimal  computing  budget  allocation.  The 
proposed  approach  improves  the  updating  of  the  sampling  distribution  by  carrying  out  this  computing  budget 
allocation  in  an  efficient  manner,  by  minimizing  the  expected  mean-squared  error  of  the  CE  weight  function. 
Numerical  experiments  indicate  that  the  computational  efficiency  of  the  CE  method  can  be  substantially 
improved  if  the  ideas  of  computing  budget  allocation  are  applied. 

We  have  also  made  considerable  advances  in  applying  our  sampling  and  model-based  framework  and 
related  techniques  to  various  applications.  For  example,  the  assessment  of  dose-response  is  an  integral 
component  of  the  drug  development  process.  Parallel  dose-response  studies  arc  conducted,  customarily,  in 
preclinical  and  phase  1,  2  clinical  trials  for  this  purpose.  Practical  constraints  on  dose  range,  dose  levels  and 
dose  proportions  arc  intrinsic  issues  in  the  design  of  dose  response  studies  because  of  drug  toxicity,  efficacy, 
FDA  regulations,  protocol  requirements,  clinical  trial  logistics,  and  marketing  issues.  We  have  developed 
(see  [27])  a  free  on-line  software  package  called  Controlled  Optimal  Design  2.0  for  generating  controlled 
optimal  designs  that  can  incorporate  prior  information  and  multiple  objectives,  and  meet  multiple  practical 
constraints  at  the  same  time.  Researchers  can  either  run  the  web-based  design  program  or  download  its 
stand-alone  version  to  construct  the  desired  multiple -objective  controlled  Bayesian  optimal  designs.  Be¬ 
cause  researchers  often  adopt  ad-hoc  design  schemes  such  as  the  equal  allocation  rules  without  knowing 
how  efficient  such  designs  would  be  for  the  design  problem,  the  program  also  evaluates  the  efficiency  of 
user-supplied  designs. 

In  another  application  of  simulation  optimization,  we  have  studied  a  finance  problem  in  [5].  Assuming 
the  underlying  assets  follow  a  Variance-Gamma  (VG)  process,  we  consider  the  problem  of  estimating  sen¬ 
sitivities  such  as  the  Greeks  on  a  basket  of  stocks  when  Monte  Carlo  simulation  is  employed.  We  focus  on  a 
class  of  derivatives  called  mountain  range  options,  comparing  indirect  methods  (finite  difference  techniques 
such  as  forward  differences)  and  two  direct  methods:  infinitesimal  perturbation  analysis  (IPA)  and  the  like¬ 
lihood  ratio  (LR)  method,  where  the  latter  is  also  implemented  via  a  recently  proposed  numerical  technique 
developed  by  Glasserman  and  Liu  using  the  characteristic  function.  We  carry  out  numerical  simulation  ex¬ 
periments  to  evaluate  the  efficiency  of  the  different  estimators  and  discuss  the  strengths  and  weakness  of 
each  method. 

Motivated  by  IPA  and  the  LR  method,  we  derive  in  [44,  43]  a  new  unbiased  stochastic  derivative  estima¬ 
tor  for  a  class  of  discontinuous  payoff  functions  that  arise  in  many  options  pricing  settings  from  finance.  Our 
method  includes  IPA  and  the  LR  method  as  special  cases  and  can  be  applied  to  functions  of  more  general 
forms  containing  indicator  functions.  This  new  estimator  can  be  computed  from  a  single  sample  path  or  sim¬ 
ulation,  whereas  existing  estimators  in  the  literature  require  additional  simulations.  We  apply  this  method  to 
sensitivity  analysis  for  European  call  options  and  American  style  call  options.  For  pricing  American-style 
derivatives  using  a  gradient-based  stochastic  approximation  algorithm,  numerical  experiments  indicate  that 
the  estimator  is  computationally  more  efficient  than  other  estimators  in  the  literature. 
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In  [9],  we  analyze  a  call  center  with  multiple  customer  types  and  dynamic  priority  service  discipline,  in 
which  a  low-priority  customer  becomes  high  priority  when  its  waiting  time  exceeds  a  given  deterministic 
service  level  threshold.  Within  each  priority  queue,  the  service  discipline  is  first  come,  first  served.  Based 
on  a  fluid  approximation  of  the  system,  we  apply  infinitesimal  perturbation  analysis  (IPA)  to  derive  estima¬ 
tors  for  the  derivatives  of  the  queue  lengths  with  respect  to  the  threshold  parameter.  Numerical  examples 
illustrate  the  validity  of  the  fluid  model  approximation  and  the  accuracy  of  the  IPA  estimators. 

In  [11],  we  consider  distributed  ordinal  comparison  of  selecting  the  best  option,  which  maximizes  the 
average  of  local  reward  function  values  among  available  options  in  a  dynamic  network.  Each  node  in  the 
network  knows  only  its  reward  function,  and  edge-connectivity  across  the  nodes  changes  over  time  according 
to  Calafiores  model.  To  estimate  each  options  global  reward  function  value,  local  samples  for  each  option 
are  generated  at  each  node,  and  those  arc  iteratively  mixed  over  the  network  by  a  weighted  average  of  local 
estimates  of  instantaneous  neighbors.  Each  node  selects  an  option  that  achieves  the  maximum  of  the  current 
global  estimates  as  an  estimate  of  the  best  option.  We  establish  a  lower  bound  on  the  probability  of  correct 
local-selection  at  any  node,  which  uniformly  converges  over  the  nodes  to  a  lower  bound  on  the  probability 
of  correct  global-selection  by  a  virtual  centralized  node  with  the  total  available  samples. 

In  [47,  46],  we  propose  a  stochastic  search  algorithm  for  solving  general  optimization  problems  with 
little  structure.  The  algorithm  iteratively  finds  high  quality  solutions  by  randomly  sampling  candidate  so¬ 
lutions  from  a  parameterized  distribution  model  over  the  solution  space.  The  basic  idea  is  to  convert  the 
original  (possibly  non-continuous,  non-differentiable)  problem  into  a  differentiable  optimization  problem 
on  the  parameter  space  of  the  parameterized  sampling  distribution,  and  then  use  a  direct  gradient  search 
method  to  find  improved  sampling  distributions.  Thus,  the  algorithm  combines  the  robustness  feature  of 
stochastic  search  from  considering  a  population  of  candidate  solutions  with  the  relative  fast  convergence 
speed  of  classical  gradient  methods  by  exploiting  local  differentiable  structures.  We  analyze  the  convergence 
and  convergence  rate  properties  of  the  proposed  algorithm,  and  carry  out  a  numerical  study  to  illustrate  its 
performance. 

2.2  Simulation  Optimization 

2.2.1  Direct  Gradient  Augmented  Regression  (DiGAR) 

The  classical  linear  regression  model  takes  the  form: 

>’(■  =  PlXil  +  Aa<2  4 - h  fidxid  +  £(  =  XJ  P  +  £/, 

where  x,-  =  (x, i , xq ,  •  •  •  ,x/rj)T  e  M.d  and  P  is  to  be  estimated  from  the  data.  The  Direct  Gradient  Augmented 
Regression  (DiGAR)  model  in  [18]  adds  the  gradient  in  an  intuitive  manner: 

}'i  =  xJfi  +  Ei,  (3) 

Si  =  P  +  A?  (4) 

where  y/  and  g(,  i  =  1,2, ...  ,k  arc  the  performance  measures  and  gradient  estimates  with  residuals  {£,}  and 
{5,},  respectively.  Since  (4)  is  also  a  linear  relationship,  it  could  in  principle  be  combined  with  (3)  to  obtain 
the  traditional  model  of  the  same  form  but  with  higher  dimension  (see  Fu  and  Qu  [18,  32]),  but  at  a  loss 
of  the  intuition  that  will  become  evident  by  keeping  them  separate.  For  illustrative  purposes,  consider  the 
one-dimensional  problem,  i.e.,  the  given  data  points  arc  (jci,yi,gi),. . .,  (jc* , )  andXjS  =  /Jo  +.v/J| .  Using 
ordinary  least  squares,  the  function  to  be  minimized  is  the  sum  of  the  squared  deviations  in  both  v;  and  gy. 

L=i  (yt  -  A)  -  A  Xif  + 1  &  -  Pi  )2-  (5) 

i—  1  i—  1 

Here,  for  simplicity,  the  two  components  arc  equally  weighted;  [18]  includes  the  general  convex  combination 
case.  Denoting  j3- P  and  /i/%  i  =  0,1,  as  the  respective  DiGAR  and  classical  linear  regression  estimators,  the 
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resulting  estimators  that  minimize  (5)  arc 


A?  =  y-tfz, 


P?  = 


I  {xi~x){yi-y)  +  kg 
i=  1 


L  ( xi~x)2+k 
i—  1 

whereas  the  estimators  in  classical  linear  regression  are 


L  (xi-x)(yi-y) 

Po  =  y  -  Ph,  AL  =  ^-k - ,  (8) 

I  (Xi-x)2 
i=  1 

where  x,  y  and  g  are  the  corresponding  sample  means  of  x,-,  y/  and  g,.  Note  that  in  the  DiGAR  model,  the 
form  of  the  intercept  estimator  given  by  (6)  remains  unchanged,  whereas  the  slope  estimator  given  by  (7) 
has  the  additional  terms  ng  and  n  in  the  numerator  and  denominator,  respectively,  reflecting  the  additional 
direct  gradient  information. 

Assumption  1. 

i)  The  estimators  for  the  responses  and  gradients  are  unbiased ,  i.e.,  E(Sj)  =  £(£,-)  =  0. 

ii)  All  the  residuals  are  uncorrelated,  i.e.,  Cov  (£,■.£/)  =  0,  Co v ( <5, ,  8j )  =  0,  Cov(£,-,  Sj)  =  0,  V  i,j. 

The  quality  of  the  slope  estimator  is  critical  for  use  in  sequential  RSM,  as  it  provides  the  basis  for  the 
search  direction.  Denoting  the  respective  response  and  gradient  residual  variances  by  Var(£,)  =  a 2  and 
Var(A)  =  <T7-  the  following  result  provides  one  simple  sufficient  condition  under  which  the  variance  of  the 
slope  estimator  is  smaller  for  the  DiGAR  model. 

Proposition  1.  Under  Assumption  1,  the  variance  of  DiGAR  estimator  Varf/J  f)  <  Var(j8jL)  if  ag  <C  c2, 
where 

k  7  7 

k  +  Yj  xf  —  kx~ 


£  xf  —  kx 2 

i=l 

Since  C  >  1 ,  as  long  as  the  gradient  estimate  is  not  too  much  noisier  than  the  variance  of  the  performance 
estimate,  the  DiGAR  slope  estimator  is  guaranteed  to  provide  statistical  improvement. 

Assumption  2.  The  residuals  are  normally  distributed,  i.e.,  £,  ~  (0,  <72  ).  S,  ~  ./C (0.  cr2). 

Assumptions  1  and  2  imply  that  the  performance  and  gradient  estimates  arc  independent  due  to  the 
residuals  being  uncorrelated,  and  the  likelihood  function  is  given  by 


L(Po,Pi,o2,Vg)  =  (2n)  k{ocg)  AexP|~^2  Po~Pixi)2- J^igt-piY 

which  leads  to  the  following  respective  maximum  likelihood  estimators  (MLEs)  for  A)  and  A  : 


Pg=y-Pix,  P?  = 


■h  L  xffi  +  4  g~  4xy 

i=  1  * 

k  ’ 

_L  v  y2  - A _ L  v2 

rr2  L,  Xj  -r  n2  rrlX 


for  which  theoretical  superiority  of  the  slope  estimator  can  also  be  established  (proof  in  [18]). 
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A  „  A  - 

Proposition  2.  Under  Assumptions  1  and  2,  the  MLE  /3,  in  (9)  has  smaller  variance  than  /3j  . 

These  results  provide  theoretical  support  for  the  intuition  that  the  availability  of  direct  gradient  esti¬ 
mates  is  beneficial.  However,  in  most  practical  applications,  the  most  critical  assumptions  arc  generally  not 
satisfied,  e.g.,  direct  gradient  estimates  are  generally  correlated  with  and  less  precise  than  the  performance 
estimates.  Furthermore,  the  implicit  assumption  that  the  variance  is  constant  over  the  input  parameter  space 
is  also  violated.  Even  so,  preliminary  simulation  experiments  with  queueing  systems  indicate  that  the  con¬ 
clusions  arc  robust  to  violations  in  the  assumptions,  i.e.,  substantial  gains  arc  observed  using  the  DiGAR 
models.  The  following  example  illustrates  typical  results  that  have  been  observed. 

Queueing  Example 

A  single-server,  first-come,  first-served  queue  with  Poisson  arrivals  and  i.i.d.  exponentially  distributed  ser¬ 
vice  times  (i.e.,  an  M/M/1  queue)  is  considered,  where  the  arrival  rate  is  fixed  (at  0.2).  The  performance 
measure  of  interest  is  the  expected  system  time.  For  simplicity,  the  queue  is  assumed  to  start  empty,  and  the 
output  performance  is  the  expected  service  time  of  the  ith  customer,  denoted  by  yilK  which  can  be  calculated 
analytically  as  a  function  of  the  mean  service  time,  denoted  by  x,  for  the  purposes  of  evaluating  the  quality 
of  the  fits  of  the  different  regression  models.  For  this  and  many  queueing  examples,  direct  gradient  estimates 
arc  available  using  IPA,  FR/SF,  and  WD;  for  the  numerical  results  reported  here,  the  IPA  estimator  is  used, 
as  it  has  the  lowest  variance  in  this  setting.  To  compare  with  traditional  regression,  three  DiGAR  models  arc 
considered:  uncorrelated,  independent  Gaussian,  and  a  correlated  Gaussian  model  to  be  described  shortly. 
The  simulated  data,  true  model  and  fitted  models  (DiGAR  =  uncorrelated,  DiGARn  =  independent  normal, 
DiGAR*  =  correlated  normal)  arc  plotted  in  Figure  1,  along  with  the  10  data  points  (atx  =  3.6, 3.7, . . . ,  4.5), 
which  themselves  arc  sample  means  based  on  10  replications.  All  methods  fit  the  model  reasonably  well  for 
the  2nd  and  3rd  customers,  but  there  arc  dramatic  differences  in  y—  and  y^\  where  the  slope  of  the  tradi¬ 
tional  model  has  the  incorrect  sign,  because  the  small  number  of  replications  leads  to  very  noisy  estimates  of 
the  system  time  that  indicate  a  negative  trend.  The  direct  gradient  estimates  (also  based  on  the  small  number 
of  replications),  however,  provide  critical  additional  information  that  leads  to  each  of  the  DiGAR  models 
capturing  the  correct  orientation  of  the  curve.  Similar  results  can  be  observed  for  higher-order  functional 
forms,  e.g.,  quadratic  (see  Fu  and  Qu  [18]). 

Preliminary  theoretical  and  experimental  results  such  as  these  and  others  in  Fu  and  Qu  [18]  arc  highly 
encouraging.  In  our  proposed  research,  we  will  use  generalized  least  squares  (GFS)  to  handle  correla¬ 
tions  and  heteroscedasticity.  We  assume  the  residuals  have  zero  mean,  i.e.,  E(e)  =  0,  so  E{ y)  =  X/J,  and 
Cov(£)  =  V,  where  the  covariance  matrix  V  is  non-diagonal  due  to  the  correlations  between  y,  and  gj.  The 
generalized  least  squares  estimator  is 

J3  =  (XTV-1X)-1XTV-1y, 

and  the  covariance  matrix  for  [5  is 

Cov(j3)  =  (X’^X)"1. 

If  the  residuals  arc  assumed  to  be  normally  distributed,  the  MFE  of  J3  is  the  same  as  the  GFS  estimator.  To 
make  the  analysis  of  the  slope  estimator  tractable,  we  assume  that  y,-  is  correlated  with  gj  only  when  i  =  j 
and  corr (y,-,gi)  =  p,  i  =  1,2,  ■  ■  •  ,k.  Under  these  assumptions,  the  variance  of  j3f  is  given  by 

Var(/3]D)  = -  °2  - ,  (10) 

If  0  <  a2  <  oo,  0  <  <  oo  and  — 1  <  p  <  1,  then  we  can  show  that  Var(/3]D)  in  (10)  is  smaller  than 

Var(j3]L).  Generally,  p,  and  more  generally  V,  is  unknown  and  must  be  estimated  based  on  data.  Although 
the  theoretical  analysis  indicates  potential  for  variance  reduction  from  using  the  correlated  DiGAR  model, 
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Figure  1 :  Expected  time  in  system  for  each  customer.  For  y*4i  and  y(4j.  the  slope  of  the  standard  regression 
model  has  the  incorrect  sign  (negative  rather  than  positive). 


the  extra  computational  budget  spent  on  accurately  estimating  correlations  must  be  traded  off  with  any 
potential  performance  gains. 

2.2.2  Gradient-Enhanced  Stochastic  Kriging  (GESK) 

Stochastic  kriging  (SK)  was  introduced  by  Ankcnman,  Nelson,  and  Staum  [1]  to  handle  the  stochastic 
simulation  setting,  where  the  noise  in  the  fitted  curve  could  be  viewed  as  coming  from  both  uncertainty  in 
the  fit  and  stochastic  nature  of  the  underlying  system.  Thus,  unlike  in  regular  (deterministic)  kriging,  the 
fitted  curve  in  stochastic  kriging  need  not  go  through  every  data  point,  making  it  closer  to  regression  rather 
than,  e.g.,  spline  interpolation.  In  the  same  spirit  as  DiGAR  and  the  second  direction  we  propose  here,  Chen, 
Ankenman,  and  Nelson  [13]  introduced  stochastic  kriging  with  gradient  estimators  (SKG),  showing  that 
SKG  provides  better  prediction  than  ordinary  SK,  in  the  sense  of  smaller  mean  squared  error  (MSE).  SKG 
is  similar  to  cokriging  used  in  deterministic  simulations  and  directly  differentiates  correlation  functions. 
The  approach  introduced  in  Qu  and  Fu  [32,  31]  and  summarized  here  is  fundamentally  different,  because 
it  generates  a  set  of  completely  new  data  points  rather  than  improving  the  estimated  fit  at  the  originally 
provided  points. 

Given  an  experiment  design  (x,-,  nfj,  i  =  1 , 2,  •  •  ■  , k,  stochastic  kriging  models  the  simulation  output  yy(x,) 
from  jth  replication  at  design  point  x,  as: 

y/(x,)  =  f(x,)7  J3  +  M(x,-)  +£;-(x;), 
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where  f(x,)  £  W  with  known  functions  of  x,-,  J3  £  with  unknown  parameters  to  be  estimated,  M  is  a 
realization  of  a  zero-mean  random  field.  The  trend  term  f(x,-)7  j3  represents  the  overall  surface  mean  and  the 
measurement  error  is  denoted  as  £7(x,-).  The  uncertainties  in  M  and  Ej  are  referred  as  extrinsic  and  intrinsic 
uncertainties,  respectively.  Denote  the  sample  mean  of  response  output  and  the  average  simulation  noise  at 

x,-  as 

yfc)  =  -  E  yA*i)>  £(x0  =  -  E  eAx0> 

n‘  j= 1  n‘  j= 1 


with  y  =  (y(xi),y(x2),  •  •  •  ,y(x^))T . 

Suppose  we  want  to  predict  the  response  y(xo)  at  xo-  Let  Em  be  the  kxk  covariance  matrix  implied 
by  the  random  field  M  and  Ee  be  the  kxk  covariance  matrix  implied  by  the  simulation  noise  across  all  de¬ 
sign  point  xi,X2, •  •  •  ,Xfc.  Let  £m(xo,  •)  =  (Cov(y(xo),y(xi),  •  •  •  ,Cov(y(xo)  ,y(xk))T  denote  the  covariances 
between  y(xo)  and  the  responses  from  all  design  points.  Also,  let  F  =  (f(xi),f(x2),  ■  •  •  . f'(x/())  be  the  design 
matrix.  The  MSE-optimal  predictor  is  of  the  form 

y(x0)  =  f(xo)7j3+£M(xo,-)7'(^M+Sr)_1(y-Frj3),  (11) 

and  the  optimal  MSE  is 


MSE(y(x0))  =  £m(xOixo)  —  £m(xo>')T  Pm  +£e]  (12) 

In  an  enhanced  data  setting,  we  observe  the  responses  y7  { x, )  and  the  gradient  estimates  g;-(x,-)  for  the  /th 
simulation  replication  at  design  points  x,-.  Instead  of  modeling  the  gradient  estimates  with  the  partial  deriva¬ 
tive  of  the  random  field  M  as  in  [13],  we  model  g7(x,-)  as  a  noise  measurement  of  the  true  gradient  g(x,-),  i.e., 
g j(x.j)  =  g(x/)  +  Sj(xj).  Denote  the  sample  mean  of  gradient  estimates  and  the  average  simulation  noise  at 


Xj  as 


g(x')  =  “E§i(x')’  *(x«0 

' '  ;= i 


Notice  that  the  response  and  the  gradient  estimates  are  noisy  and  usually  coiTelated,  and  we  assume  that 
S(xj)  is  independent  of  the  random  field  M. 

To  incorporate  gradient  estimates  into  stochastic  kriging,  we  extrapolate  in  the  neighborhood  of  the 
original  design  points  {x,-},  i  =  1,2,  ■  ■  ■  ,k,  i.e.,  additional  response  data  is  generated  via  linear  extrapolations 
using  the  gradient  estimates  as  follows: 

x+  =  x;  +  Ax,-,  y+(xf )  =  y(x/)  +  g(x/)  •  Ax,-.  (13) 

Different  extrapolation  techniques  can  be  applied  in  (13),  and  we  can  also  add  multiple  points  to  the  neigh¬ 
borhood  of  Xj.  In  this  preliminary  study  we  assume  that  the  same  step  size  is  used  for  all  design  points,  i.e., 
Ax,  =  Ar,  /  =  1 .2.  •  •  •  .k.  We  also  assume  that  only  one  additional  point  is  added  in  the  neighborhood  of 
Xj.  Let  }’i  =  y(xj )  and  y/  =  y(xf)  for  simplicity  and  y*  be  the  2k  x  1  vector  containing  all  of  the  original 
response  outputs  and  the  additional  response  outputs  in  (13): 

y*  =  (yun,  -  ■  •  ,yk',yt  ,yt  ,■  ■  ■  ,y£)- 

Similarly,  x+  is  defined  as 

x+  =  (xi,x2,-“  ,xk;x+,x+  ,•••  ,x+). 


To  tit  this  augmented  dataset  into  the  stochastic  kriging  approach,  we  model  the  additional  points  similar  to 
the  original  response  output,  i.e., 


y+(xf)  =  f(x+)Tp  +  M(x+)  +  £+(x+), 
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and  the  variance  of  the  noise  £+(x+ )  and  the  covariance  between  £+(x+ )  and  £(x,)  arc  approximated  by 


Var(£+(x+))  =  Var (£(x,-))  +  (Ax)2tr  Cov(5(x,-))  +  2(Ax)lrCov  ^£(x,-),5(: 
Cov  (£+(x^),£(x,))  =  Var(£(x,-))+ArlrCov  (£(x(),  S(xi)')  . 


k,  and  EL  be  a  2k  x  2k  covariance 


Let  E2M  =  Cov[M(x;),M(x+)],E+  =  Cov[M(x;+),  M(x+)], /,  j  =  1,2, 
matrix  across  all  the  original  design  points  and  additional  design  points,  which  takes  the  form 

TSm 


Similarly,  let 


E* _ 

M  — 


y  *  _ 

*-£  ~ 


V: 


vf  ' 


'MJ 


y  yt 
Ep  E+ 


where 


Ej  =  diag{Cov(£+(x+),£(xi))  ,•••  ,Cov(£+(x+),£(xjt))}, 

E+  =  diag{Var(£+(x+))  ,•••  ,Vai-(£+(x+))}. 

Let  E^|(xo,  •)  be  the  covaiiance  betweeny(xo)  and  all  2k  design  points.  Also,letF*  =  (f(xi),f(x2),  •  •  •  ,f(x^), 
f(xj"),f(x^), •  •  •  ,f(x^))r  be  the  design  matrix.  Under  the  enhanced  data  setting,  we  can  easily  find  the 
MSE-optimal  predictor  and  the  corresponding  MSE  by  substituting  y*,  F*,  EM (xo,  •),  EM  and  E*  for  y,  F, 
Em  (xo,  •),  Em  and  Ee  in  (11)  and  (12),  respectively. 

The  random  field  M  is  assumed  to  be  second-order  stationary,  i.e., 

EM(xj,Xy)  =  v 2R(xi-xj;6), 

where  r2  =  Var  [M  (x)]  and  /Ax,  —  xj;  d)  is  a  correlation  function  with  parameter  6  depending  on  the  distance 
between  x(-  and  xj.  The  extended  covariance  matrix  EM  follows  the  same  correlation  structure,  and  the 
parameters  (z2.6 )  and  J3  can  be  estimated  from  maximum  likelihood  estimators  (MLEs)  provided  that  E* 
is  known. 


Illustrative  Numerical  Example 

Here,  we  consider  a  stylized  example  adopted  from  [39]  where  direct  gradient  estimates  are  just  assumed  to 
be  available  in  the  canonical  form: 

y(x)  =  exp(— 1.4x)cos(77rx/2) +  £,  — 2<x<0,  £~^L(0,1),  g(jc)  =y'(x)  +  5,  <5~o/L( 0,25). 
This  is  just  an  exponentially  damped  sinusoidal  function  with  added  noise.  A  Gaussian  correlation  function 
R(x,x')  =  exp{0(x  —  x')2}  is  used  for  the  kriging.  Four  experiments  will  be  used  to  illustrate  some  charac¬ 
teristics  of  SK,  SKG,  and  GESK,  with  respect  to  the  choice  of  design  points  and  number  of  replications. 

In  the  upper  left  graph  in  Figure  2  we  plot  predictions  using  six  equally  spaced  design  points  based  on 
20  samples  at  each  point.  Because  GESK  is  able  to  explore  the  design  space  more  via  extrapolation,  it  does 
a  better  job  of  capturing  the  fluctuations  of  the  response  surface.  When  the  number  of  points  is  increased 
to  eight  (again  equally  spaced,  with  20  samples  per  point),  which  in  addition  to  being  a  larger  number  of 
points  also  happens  to  place  the  points  in  more  critical  parts  of  the  design  space,  all  three  of  the  approaches 
arc  able  to  do  a  decent  job  of  capturing  the  true  shape,  as  indicated  in  the  upper  right  graph. 

In  the  next  experiment  we  increase  the  number  of  replications  by  an  order  of  magnitude  (from  20  to  200 
per  each  of  the  eight  design  points),  which  surprisingly  leads  to  a  dramatically  worse  fit  for  SK,  as  shown  in 
the  lower  left  graph.  The  last  experiment  then  increases  the  number  of  points  again,  this  time  to  20  (with  200 
replications  per  design  point),  at  which  point  both  SK  and  SKG  perform  poorly,  whereas  GESK  continues 
to  provide  an  improved  fit.  Again,  because  the  location  of  the  20  points  is  not  ideally  suited  to  the  shape  of 


11 


Figure  2:  Test  function:  y{x)  =  exp(— 1.4.r)cos(7 nx/2)  +  e,  using  equally  spaced  design  points; 

top  row  20  samples  per  design  point;  bottom  row  200  samples  per  design  point;  SK  and  SKG  have  poor  fits 

when  design  points  not  well  placed  (cf.  top  left  and  bottom  right),  whereas  GESK  robust  to  placement. 


the  curve,  SKG  is  lead  astray,  whereas  GESK  is  still  able  to  capture  the  shape  due  to  the  additional  points 
generated  that  can  serve  to  compensate. 

This  simple  numerical  example  indicates  how  both  GESK  and  SKG  can  dramatically  improve  the  meta¬ 
model  fit  over  ordinary  stochastic  kriging. 

2.3  Simulation-Based  and  Sampling  Methods  for  Markov  Decision  Processes 

We  define  an  MDP  {XtJ  =  0, 1,...,T}  on  state  space  5?  and  action  space  sd  (cf.  e.g.,  [3,  9]).  In  period 
(stage)  i,  the  MDP  in  state  Xt  E  5?  takes  action  a\  £  ,f/,  incurs  cost  to,-),  where  ft),-  denotes  the 

stochastic  element  (e.g.,  random  number),  and  then  transitions  according  to 

X;  ■  |  —  f[  ■  |  (Xl ,  Clf ,  ft)/)  , 

where  fi(x,a,-)  denotes  the  (stochastic)  transition  function  in  period  i  for  action  a  taken  in  state  x  For 
notational  simplicity,  we  have  not  made  state  and  action  spaces  period  dependent. 

The  objective  is  to  find  a  feedback  control  policy  K  =  ,  which  is  a  sequence  of  decision  rules 
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specifying  the  action  a,-  taken  when  in  state  x  in  period  /,  that  minimizes  an  expected  cost  function,  usually 
either  finite  horizon  total  cost,  finite  horizon  discounted  total  cost,  infinite  horizon  average  cost,  or  infinite 
horizon  discounted  total  cost.  While  we  will  consider  all  of  these  types  of  cost  criteria  in  our  research,  in  this 
proposal  we  focus  on  the  discounted  total  cost  setting,  both  finite  and  infinite  horizon;  for  the  risk-neutral 
case,  we  define  the  value  function  associated  with  a  policy  and  initial  state: 

Vn{x)  =  E 

where  a  is  the  (one -period)  discount  factor  and  T  could  be  infinite,  under  the  assumption  that  the  limit 
is  then  well  defined.  As  stated  earlier,  the  chief  context  is  the  setting  in  which  simulation  is  required  to 
generate  the  system  dynamics  (state  transitions)  and/or  period  costs. 

We  begin  by  defining  some  familial-  quantities: 

Qi(x.  a)  =  (expected)  cost-to-go  (Q-function)  in  period  i  for  action  a  taken  in  state  x 
and  optimal  actions  taken  henceforth; 

Vi(x)  =  optimal  value  function  in  period  i  for  state  x. 

Then  we  have  the  usual  Bellman  optimality  equation  [3,  30]: 

V,-(x)  =  inf  {E[Ci(x1a,coi)  + aVi+i(fi+i(x, a, (Oi))}} ,  (15) 

a 

written  here  in  two-part  form: 

Qi(x,a )  =  E  [Ci(s,a,Q)i)  +  aVi+i(fi+1(x,a,G)j))] ,  (16) 

Vi(x)=wfQi(x,a).  (17) 

a 

An  optimal  policy  in  period  i  will  be  denoted  by 

7T;*(x)  E  arginf<2i(x,a),  i  =  0,...,T  —  1,  x  £  5? .  (18) 

a 

When  the  policy  is  stationary,  the  subscript/argument  i  will  be  dropped.  In  the  infinite  horizon  stationary 
case,  (15)  takes  the  following  form: 

V  (x)  =  inf  {i?  [C(x,a,  to)  +  aV  (fix, a, «))]} ,  (19) 

a 

and  we  will  assume  there  exists  an  optimal  stationary  policy  such  that 

TZ*  (x)  G  arginfQ(x,a),  x  £  5? . 

a 

Traditional  methods  of  policy  iteration,  value  iteration,  and  variants  based  on  linear  programming  all  suffer 
from  the  curse  of  dimensionality.  Furthermore,  the  transition  function  /,•  is  generally  not  known  in  closed 
form  (note  that  in  traditional  MDP  formulations,  it  is  expressed  in  terms  of  explicit  transition  probabilities 
assumed  given),  but  may  be  generated  by  a  complicated  stochastic  simulation  model,  so  in  such  a  setting, 
the  traditional  methods  are  not  directly  applicable. 

2.3.1  Non-Convex  Dynamic  Measures 

In  recent  years,  with  particular  motivation  from  the  field  of  mathematical  finance,  many  new  approaches  to 
the  incorporation  of  risk  into  decision  making  have  been  developed.  One  well-studied  approach  involves 
coherent  or  convex  risk  measures  (cf.  [2,  17]);  here,  we  briefly  describe  some  of  the  work  of  Ruszczynski 
and  colleagues  in  applying  this  approach  to  multi-period  problems  [38,  6].  In  this  work,  expected  value 
operators  are  replaced  by  more  general  risk  measures.  It  is  shown  that  time-consistent  dynamic  risk  mea¬ 
sures  can  be  written  recursively  in  terms  of  one-step  conditional  risk  measures;  furthermore,  these  one-step 
conditional  risk  measures  are  assumed  to  satisfy  properties  of  coherent  measures  of  risk.  Examples  of  one- 
step  conditional  risk  measures  satisfying  these  conditions  are  mean-semideviation  models  and  conditional 
Value-at-Risk  (cf.  Rockafellar  and  Uryasev  [35,  36]).  These  risk  measures  are  applied  to  controlled  Markov 
processes  by  defining  Markov  risk  measures. 


t- t 

£  a'CiiXi.a^CDi)  X0=. 

i= 0 
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To  motivate  the  form  of  these  risk  measures,  consider  the  cost  function  (14)  with  a  =  1  and  where  C,  is 
only  a  function  of  X,  and  a,.  Using  the  tower  property  of  conditional  expectation,  Vn(X o)  can  be  written  as 

yjr(Zo)  =  Co(Xo)ao)+Po(C1(Z1,a1)  +  p1(C2(X2,a2)  +  ---  +  pT_2(Cr-i(Xr_1,ar-i))---)),  (20) 

where,  for  any  function  h, 

Pt  (h(Xt+ 1 ,  at+i ))  =  E  [h(Xt+i,  at+\ )  |X,] . 

The  work  of  Ruszczynski  and  colleagues  generalizes  the  cost  function  (20)  to  more  general  Markov  risk 
measures  satisfying  a  number  of  properties.  Under  appropriate  hypotheses,  value  functions  arc  defined  and 
dynamic  programming  equations  arc  shown  to  lead  to  optimal  Markov  policies.  Discounted  infinite -horizon 
problems  and  undiscounted  transient  problems  arc  also  studied,  along  with  their  corresponding  analogs  of 
value  iteration  and  policy. 

Cumulative  Prospect  Theory  (CPT) 

Prospect  theory  was  proposed  in  the  1970s  by  Kahneman  and  Tversky  to  model  observed  behavior  in  human 
decision  making  that  could  not  be  adequately  explained  by  existing  utility  theory.  Although  prospect  theory 
did  a  better  job  of  explaining  the  experimental  data,  and  served  as  the  basis  on  which  the  Nobel  Prize 
was  awarded  to  Kahneman  in  2002  (Tversky  had  already  passed  away),  there  were  still  some  theoretical 
deficiencies  that  led  to  various  proposed  alternatives  to  and  extensions  of  prospect  theory.  Cumulative 
prospect  theory  (CPT),  also  developed  by  Kahneman  and  Tversky  [41],  posits  a  utility  function  that  has 
a  reference  point  against  which  gains  and  losses  arc  measured,  and  is  concave  on  gains  and  convex  on 
losses  (i.e.,  horizontal  S-shape),  along  with  a  probability  weighting  function  that  transforms  the  probability 
measure  such  that  a  small  probability  is  inflated  and  a  large  probability  is  deflated. 

Based  on  the  special  utility  and  weighting  functions,  a  CPT  performance  measure  can  be  characterized. 
A  CPT  performance  measure  has  the  following  form: 

poo  poo 

p(R)  =  I  w+  (P  (m+  ((/?  —  5)+)  >  ?))  dt  —  J  w~  (P  (w_  {{R  —  B)_)  >  ?))  dt,  (21) 

where  w+  :  [0, 1]  ->[0,1]  and  w~  :  [0, 1]  — >  [0, 1]  are  two  continuous  non-decreasing  functions.  u+  :  M+  — >  M+ 
and  u  :  M+  — >  M+  are  two  utility  functions,  and  the  random  variable  B  represents  the  reference  point  against 
which  the  performance  is  measured. 

The  key  to  understanding  Equation  (21)  is 

[  w (P  (u(x)  >  t)) dt  =  I  u(x)d  [w(Fx  (*))] ,  (22) 

JO  Ju(x)>  0 

so  that  the  CPT  performance  measure  can  be  interpreted  as  the  sum  of  the  distorted  expected  gains  and 
losses.  As  mentioned  earlier,  CPT  and  other  related  approaches  better  explain  empirically  observed  human 
behavior  in  many  cases.  One  way  to  understand  these  approaches  is  the  following:  recall  that  an  expected 
utility  assigns  a  utility  to  each  outcome,  and  then  weights  each  outcome  according  to  its  probability  of 
occurrence.  In  CPT,  not  only  are  the  outcomes  transformed  using  the  utility  function,  but  the  probabilities 
by  which  they  arc  weighted  arc  also  transformed  using  the  weighting  function. 

Probabilistic  Sensitivity 

Combining  a  number  of  other  approaches,  including  both  expected  utility  maximization  and  CPT,  He  and 
Zhou  [19]  proposed  a  method  of  risk-averse  optimization  via  quantiles.  At  the  heart  of  the  unification  is 
the  incorporation  of  the  idea  of  probabilistic  sensitivity  into  risk  measures,  which  is  a  source  of  technical 
challenges.  Overcoming  these  challenges,  He  and  Zhou  restrict  their  investigation  to  the  drift-diffusion  case, 
which  can  be  tackled  using  the  martingale  approach.  Probabilistic  sensitivity  replaces  the  (probabilistic) 
lineality  of  expected  utility  theory  with  more  expressive  but  also  more  challenging  nonlinear  weighting 
functions.  Because  of  this  modification,  the  resulting  stochastic  optimal  control  problems  do  not  enjoy  the 
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desirable  properties  of  time-consistency  and  convexity,  which  lead  to  fresh  challenges  in  applying  dynamic 
programming.  Our  proposed  research  builds  upon  ideas  from  this  work  in  the  MDP  setting. 

Research  Results 

The  class  of  coherent  risk  measures  satisfies  several  properties,  one  of  which  is  convexity.  In  many  cases, 
the  convexity  requirement  is  too  strong  for  practical  applications.  Specifically,  the  CPT  risk  measures, 
which  have  nonlinear  probabilistic  sensitivity,  arc  not  convex,  but  empirical  evidence  provides  support  of 
their  use  for  predicting  human  decision-making  processes.  The  class  of  CPT-inspired  measures  bestows 
upon  practitioners  the  flexibility  of  choosing  gain/loss  measure  distortion  functions,  which  is  important 
in  expressing  risk.  This  alternative  approach  of  representing  risk-sensitivity  using  probabilistic  sensitivity 
sets  CPT-inspired  performance  measures  apart  from  the  existing  risk-sensitive  measures.  A  CPT-inspired 
performance  measure  can  be  used  as  the  one-step  conditional  risk  measure  pf  (•)  (i.e.,  condition  on  the 
knowledge  at  time  t)  in  Equation  (20). 

As  noted  before,  we  arc  interested  in  a  broader  class  of  risk-sensitive  measures  than  expected  utility; 
however,  we  still  need  to  impose  some  conditions  on  the  performance  measure  such  that  dynamic  program¬ 
ming  is  still  applicable.  We  focus  on  a  class  of  risk-sensitive  measures  that  contains  a  subset  of  CPT-inspired 
risk  measures.  The  members  of  this  class  of  risk-sensitive  measures  satisfy  the  following  conditions. 

Assumption  3.  The  one-step  conditional  risk  measure,  p,  (•),  satisfies  the  following  conditions: 

1.  IfZ  <  W  then  pt{Z)  <  pt(W),  VZ, W  G  2z ft+1; 

2.  p,(J3Z)  =  ppt(Z),  VZ  G  >  0, 

where  jZJ+i  is  the  space  oft  +  1  measurable  and  integrable  functions. 

A  desirable  property  of  the  class  of  risk  measures  satisfying  Assumption  3  is  that  it  include  expected  utility 
and  coherent  risk  measures.  As  we  will  see,  this  class  also  contains  a  large  subset  of  CPT-inspired  risk 
measures.  We  denote  the  space  of  probability  measures  over  A  by  FZ  (A)  and  the  state  transition  probability 
measure  by  Q,{-\x,a).  The  first  result  presents  the  optimality  criteria  for  the  class  of  risk  measures  satisfying 
Assumption  3.  (The  proofs  of  all  theorems  in  this  section  can  be  found  in  [28].) 

Theorem  1.  Assume  Assumption  3  and  the  following  conditions  hold: 

1)  Vx  G  .Z,  the  stochastic  kernels  Qtx  :  a  —>■  Qt(-\x,a)  are  continuous. 

2)  The  one-step  dynamic  risk  measure  {Pi}js^  is  Markov,  and  3  a  sequence  of  corresponding  risk 
transition  mappings  <7, :  m  -G  O { y/,x,m),  I  =  0. ....  T  —  I  that  are  lower  semi-continuous. 

3 )  The  functions  {C,  (•,  •,  -)}J=q  are  bounded,  measurable,  and  a  — >  Ct(-,a,-)  is  lower  semi-continuous. 

4)  \/x  G  2Z  and  t  G  [0, . . . ,  T  —  1]  the  set  At(x)  is  compact. 

5)  The  function  Ct  is  bounded  and  measurable. 

Then  a  minimizer  for  the  dynamic  programing  equations, 

vt(x)  =  min  a,  (Ct(x,  ■,  ■)  +  v,+i(-) ,x,  8  o  Qt  x) 

SetZ(A{x)) 

vT  (jc)  =  CT  (x)  XGJZ,  t  =  1,...,T-1,  (23) 

exists.  Furthermore,  an  optimal  policy,  n*  =  {7To,...,7Ty_1},  exists  and  each  n*x  is  a  minimizer  for  the 
right-hand  side  of  Equation  (23).  In  addition,  every  measurable  solution  of  Equation  (23)  at  time  0,  vo,  is 
an  optimal  solution  for  Equation  (20). 

In  Theorem  1 ,  the  assumptions  arc  standard  with  the  exception  of  the  second.  In  the  second  assumption, 
pt  (•)  is  required  to  be  Markov,  which  means  it  can  be  expressed  as  a  function  of  the  risk,  the  current  state, 
and  the  transition  probability  (i.e.,  p,  (z)  =  <T  (. z,x,m ),  where  x  is  the  current  state,  and  m  is  the  transition 
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probability).  Other  more  technical  requirements  (i.e.,  boundedness)  for  a  conditional  risk  measure  to  be 
Markov  have  been  omitted  here. 

Theorem  1  provides  the  optimality  criteria  for  a  class  of  dynamic  risk  measures.  The  next  goal  is  to 
identify  the  subset  of  CPT-inspired  risk  measures  that  belongs  to  this  class,  i.e.,  satisfy  Assumption  3  and 
the  second  assumption  of  Theorem  1 .  The  following  result  provides  conditions  on  the  gain/loss  probability 
weighting  functions  such  that  the  corresponding  CPT  risk  measure  belongs  to  the  class  of  risk  measures 
satisfying  Assumption  3;  hence,  it  is  suitable  for  dynamic  programming. 

Theorem  2.  If  w+  and  w~  are  continuous  and  monotonically  non-decreasing  functions  in  Equation  ( 21 ), 
then  the  assumptions  in  Theorem  1  are  satisfied;  hence,  dynamic  programming  is  applicable. 

Since  the  requirement  on  these  probability  weighting  functions  is  minimal  (i.e.,  continuity  and  mono¬ 
tonicity),  the  class  of  CPT-inspired  measures  is  large.  Examples  of  popular  probability  weighting  functions 
and  their  empirical  support  can  be  found  in  existing  literature  from  the  decision  theory  community.  The 
finite-horizon  case  can  be  extended  to  the  infinite-horizon  transient  case  via  assumptions  such  as  k-step 
contraction  and  the  Markov  model  being  uniformly  transient,  as  shown  in  [28]. 

2.3.2  Additional  Results  on  Simulation-Based  and  Sampling  Methods  for  Markov  Decision  Processes 

In  a  simulation-based  approach  to  MDPs,  we  have  developed  in  [20]  a  simulation-based  algorithm  called 
Approximate  Stochastic  Annealing  (ASA)  for  solving  finite  horizon  MDPs.  The  algorithm  iteratively  esti¬ 
mates  the  optimal  policy  by  sampling  from  a  sequence  of  probability  distribution  functions  over  the  policy 
space.  By  exploiting  a  novel  connection  of  ASA  to  the  stochastic  approximation  method,  we  show  that  the 
sequence  of  distribution  functions  generated  by  the  algorithm  converges  to  a  degenerate  distribution  that 
concentrates  only  on  the  optimal  policy.  Numerical  examples  arc  also  provided  to  illustrate  the  algorithm. 

In  a  supply  chain  application,  we  consider  in  [34]  a  make-to-order  business  that  serves  customers  in 
multiple  priority  classes.  Orders  from  customers  in  higher  classes  bring  greater  revenue,  but  they  expect 
shorter  lead  times  than  customers  in  lower  classes.  In  making  lead  time  promises,  the  firm  must  recognize 
preexisting  order  commitments,  uncertainty  over  future  demand  from  each  class,  and  the  possibility  of 
supply  chain  disruptions.  We  model  this  scenario  as  an  MDP  and  use  reinforcement  learning  to  determine 
the  firms  lead-time  policy.  In  order  to  achieve  tractability  on  large  problems,  we  utilize  a  sequential  decision 
making  approach  that  effectively  allows  us  to  eliminate  one  dimension  from  the  state  space  of  the  system. 
Initial  numerical  results  from  the  sequential  dynamic  approach  suggest  that  the  resulting  policies  more 
closely  approximate  optimal  policies  than  static  optimization  approaches. 

Portfolio  Selection  as  introduced  by  Harry  Markowitz  laid  the  foundation  for  Modern  Portfolio  Theory. 
However,  the  assumption  that  underlying  asset  returns  follow  a  normal  distribution  and  that  investors  are 
indifferent  to  skew  and  kurtosis  is  not  practically  suited  for  the  hedge  fund  environment.  Additionally, 
the  lockup  and  notice  provisions  built  into  hedge  fund  contracts  make  portfolio  rebalancing  difficult  and 
justify  the  need  for  dynamic  allocation  strategies.  Market  conditions  are  dynamic;  therefore,  rebalancing 
constraints  in  the  face  of  changing  market  environments  can  have  a  severe  impact  on  return  generation. 
There  is  a  need  for  sophisticated  yet  tractable  solutions  to  the  multi-period  problem  of  hedge  fund  portfolio 
construction  and  rebalancing.  In  [14],  we  generalize  the  hedge  fund  asset  return  distribution  to  a  multivariate 
K-mean  Gaussian  mixture  distribution;  model  the  multi-period  hedge  fund  allocation  problem  as  a  Partially 
Observable  Markov  Decision  Process  (POMDP);  and  propose  practical  rebalancing  strategies  that  represent 
a  convergence  of  literature  on  hedge  fund  investing,  regime  switching,  and  dynamic  portfolio  optimization. 

In  an  application  to  semiconductor  manufacturing,  we  present  in  [33]  the  architecture  and  implementa¬ 
tion  of  a  preventive  maintenance  optimization  software  tool  (PMOST),  based  on  algorithms  for  the  optimal 
scheduling  of  preventive  maintenance  (PM)  tasks  in  semiconductor  manufacturing  operations.  We  also 
present  results  from  four  complex  simulation  case  studies,  based  on  real  industrial  data  and  employing  full 
fab  models,  to  illustrate  the  use,  data  needs  and  outcomes  produced  by  PMOST.  These  results  demonstrate 
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significant  improvements  in  tool  production  and  consolidation  of  PM  tasks.  We  give  a  description  of  the 
different  software  modules  that  compose  PMOST,  to  provide  guidelines  as  well  as  a  template  for  other 
implementations  of  the  PM  optimization  algorithms  utilized  by  PMOST. 
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Scientific  Computing,  Montreal,  Canada,  July  6-11,  2008. 
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53.  E.  Zhou,  M.C.  Fu,  and  S.I.  Marcus,  A  Particle  Filtering  Framework  for  Simulation-Based  Optimiza¬ 
tion,  Proc.  2008  INFORMS  Annual  Meeting,  Washington,  DC,  Oct.  12-15,  2008. 

54.  E.  Zhou,  M.C.  Fu,  and  S.I.  Marcus,  A  Particle  Filtering  Framework  for  Randomized  Optimization 
Algorithms,  Proc.  2008  Winter  Simulation  Conference,  Miami,  FF,  Dec.  7-10,  2008. 

55.  E.  Zhou,  M.C.  Fu,  and  S.I.  Marcus,  A  Density  Projection  Approach  to  Dimension  Reduction  for 
Continuous-State  POMDPs,  Proc.  47th  IEEE  Conference  on  Decision  and  Control,  Cancun,  Mexico, 
Dec.  9-11,  2008. 

56.  E.  Zhou,  M.C.  Fu,  and  S.I.  Marc  us,  A  Numerical  Method  for  Financial  Decision  Problems  Under 
Stochastic  Volatility,  Proc.  Winter  Simulation  Conference,  Austin,  TX,  Dec.  13-16,  2009. 

57.  E.  Zhou,  M.C.  Fu,  and  S.I.  Marcus  Solving  Continuous-State  POMDPs  via  Density  Projection,  IEEE 
Transactions  on  Automatic  Control,  Vol.  55,  1101-1116,  May  2010. 

58.  E.  Zhou,  M.C.  Fu,  and  S.I.  Marcus,  A  Particle  Filtering  Framework  for  Randomized  Optimization 
Algorithms,  IEEE  Transactions  on  Automatic  Control,  to  appeal-,  2013. 

59.  E.  Zhou  and  J.  Hu,  Gradient  Guided  Adaptive  Stochastic  Search  for  Non-Differentiable  Optimization, 
IEEE  Transactions  on  Automatic  Control,  under  revision,  2013. 

3.2  Awards 

•  Michael  Fu:  Elected  Fellow  of  the  Institute  of  Electrical  and  Electronics  Engineers  (IEEE). 

•  Michael  Fu:  Elected  Fellow  of  the  Institute  for  Operations  Research  and  the  Management  Sciences 
(INFORMS). 

•  Steven  Marcus:  Elected  Fellow  of  the  Society  for  Industrial  and  Applied  Mathematics  (SIAM). 

•  Yongqiang  Wang  (University  of  Maryland  Ph.D.  student):  received  the  INFORMS  Computing  Society 
Student  Paper  Award  2010  for  the  paper  “A  New  Stochastic  Derivative  Estimator  for  Discontinuous 
Payoff  Functions  with  Application  to  Financial  Derivatives.” 

•  Yongqiang  Wang  (University  of  Maryland  Ph.D.  student):  won  the  Best  Student  Paper  Award  for 
“Best  OR/MS-focused  Paper”  at  the  2010  Winter  Simulation  Conference,  for  the  paper  “Model-based 
Evolutionary  Optimization.” 

3.3  Ph.D.  Students  Graduated 

•  Andrew  Hall,  Ph.D.,  2009,  Univ.  of  Maryland,  supervised  by  M.  Fu,  “Simulating  and  Optimizing: 
Military  Manpower  Modeling  and  Mountain  Range  Options”  (currently:  US  Military  Academy,  West 
Point) 

•  Matthew  Reindorp,  Ph.D.,  2009,  Univ.  of  Maryland,  supervised  by  M.  Fu,  “Industrial  Flexibility  in 
Theory  and  Practice”  (currently:  Technical  University  of  Eindhoven) 

•  Abraham  Thomas,  Ph.D.,  2009,  Univ.  of  Maryland,  supervised  by  S.  Marc  us,  “Learning  Algorithms 
for  Markov  Decision  Processes.” 

•  Enlu  Zhou,  Ph.D.,  2009,  Univ.  of  Maryland,  supervised  by  S.  Marcus  and  M.  Fu,  “Particle  Filtering 
for  Stochastic  Control  and  Global  Optimization”  (currently:  Georgia  Tech) 
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•  Ping  Hu,  Ph.D.,  2011,  Stony  Brook,  supervised  by  J.  Hu 

•  Yongqiang  Wang,  Ph.D,  2011,  Univ.  of  Maryland,  supervised  by  M.  Fu  and  S.  Marc  us 

•  Kun  Lin,  Ph.D,  2013,  Univ.  of  Maryland,  supervised  by  S.  Marcus 
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